Processing Galois Field arithmetic

ABSTRACT

Efficient parallel processing of algorithms involving Galois Field arithmetic uses data slicing techniques to execute arithmetic operations on computing hardware having a SIMD architecture. A W-bit wide-word computer capable of operating on one or more sets of k-bit operands executes Galois Field arithmetic by mapping arithmetic operations of Galois Field GF(2^(n)) to corresponding operations in subfields of lower order (m<n), which are selected on the basis of an appropriate cost function. These corresponding operations can be executed simultaneously on the W-bit wide computer, such that the results of the arithmetic operations in Galois Field GF(2^(n)) are obtained in k/W times as many cycles of the W-bit computer compared with execution of the corresponding operations on a k-bit computer.

FIELD OF THE INVENTION

[0001] The invention relates to processing algorithms involving Galois Field arithmetic and relates particularly, though not exclusively, to the efficient execution of algorithms involving Galois Field arithmetic, as typically found in communications and cryptography applications.

BACKGROUND

[0002] A Galois Field is a finite set of elements in which addition, subtraction, multiplication and division (all appropriately defined) can be performed without leaving the set. Addition and multiplication must satisfy the commutative, associative and distributive laws. Galois Field arithmetic finds wide use in a variety of engineering applications, including error correcting codes and cryptography. For a concise and comprehensive exposition of Galois Fields, refer to Lidl and Niederreiter, Introduction to Finite Fields and Their Applications, Cambridge University Press, Cambridge, Mass., 1986.

[0003] In view of the varied applications noted above, there has been considerable attention given to efficient methods and apparatuses for Galois Field computations. In this respect, U.S. Pat. No. 5,689,452 issued to Cameron on Nov. 18, 1997 discloses a programmable digital computer with special-purpose logic units to efficiently perform Galois Field arithmetic. Cameron discloses a method of decoding Reed-Solomon codes in a large Galois Field GF(2^(n)) in which the finite field is represented as a quadratic extension field of one or more subfields GF(2^(m)). Basic arithmetic operations in the extension field are written solely in terms of operations performed in one or more subfields. Multiplicative operations performed in GF(2^(n)) use only operations from GF(2^(m)).

[0004] There have also been attempts to efficiently perform Galois Field arithmetic on general-purpose wide-word computers. A wide-word computer with a W-bit word can be looked upon as a SIMD (single instruction, multiple data) computer capable of operating upon one or more sets of k operands, each (W/k) bits wide, simultaneously with a common instruction. Computers with such architectures can be used to efficiently perform several computations in parallel and, accordingly, there are potential efficiency advantages that may be exploited. However, existing SIMD architectures are not ideally suited to performing Galois Field arithmetic as such architectures are not able to effectively perform operations typically associated with data manipulations executed when computing Galois Field operations.

[0005] Despite the work referred to above, there are limitations associated with existing techniques. Accordingly, a need clearly exists for a method . . . at least attempt to address these and other limitations associated with such techniques.

SUMMARY OF THE INVENTION

[0006] It is recognised that efficient parallel processing of algorithms involving Galois Field arithmetic can be achieved using an appropriate decomposition into corresponding operations in selected subfields.

[0007] Accordingly, a first aspect of the invention provides a method for processing algorithms involving Galois Field arithmetic suitable for execution by digital hardware able to process k-bit operands. This involves mapping source arithmetic operations in Galois Field GF(2^(n)) into respective sets of corresponding arithmetic operations for a plurality of isomorphic composite Galois Fields GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])), for each of which $\prod_{i=1}^{v} p[i] = n$.

[0008] For each respective set of corresponding operations, a cost function relating to an implementation of the source arithmetic operations with the set of corresponding arithmetic operations is evaluated. As a result, one of the sets of corresponding arithmetic operations is selected as a target set of arithmetic operations, based on the calculated results of the cost function for each of the respective sets. Further, the source arithmetic operations of Galois Field GF(2^(n)) are converted to the target set of arithmetic operations of the respective isomorphic composite Galois Field, the target arithmetic operations having k-bit operands.

[0009] In the described embodiment, the technique of data-slicing is used in combination with the mathematical technique of mapping arithmetic operations of the field GF(2^(n)) in terms of operations in appropriately chosen subfields of GF(2^(n)). Described embodiments enable Galois Field arithmetic to be effectively executed with SIMD computing architectures with relative efficiency and speed. An efficient implementation for any algorithm with Galois Field arithmetic can be derived where significant data-parallelism exists. Two examples of such an algorithm are Reed-Solomon decoders (generally described in Lin and Costello, Error Control Coding, Prentice Hall; ISBN: 013283796X, October 1982), and the recently selected Rijndael proposal for private key (symmetric key) cryptography.

[0010] Though there are advantages associated with implementing the described method with a data-sliced arrangement, such methods can also be executed on existing SIMD or non-SIMD architectures. The described methods are not restricted to the preferred Galois Field computer hardware architecture described herein, though there is a clear performance benefit available as the efficiency of the method depends on the architecture used.

[0011] The aspects of the invention attempt to provide an efficient implementation of applications involving Galois Field arithmetic in which there is the potential to exploit data parallelism by performing relevant calculations with relatively greater computational efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 is a schematic representation of the steps involved in determining the parameters for processing algorithms involving Galois Field calculations, in accordance with an embodiment of the present invention.

[0013] FIG. 2 is a flowchart illustrating the operations which occur in computing Galois Field operations in a SIMD architecture, in accordance with an embodiment of the present invention.

[0014] FIGS. 3.1 to 3.12 are schematic representations of the steps involved in performing a gate circuit implementation of the Rijndael algorithm, in accordance with an embodiment of the present invention.

[0015] FIG. 4 is a schematic representation of a computer system able to perform preferred embodiments of the invention.

DETAILED DESCRIPTION OF EMBODIMENTS

[0016] An embodiment of the invention is described in relation to the operation of computing hardware having a SIMD (single instruction, multiple data) architecture, for the execution of algorithms involving the calculation of Galois Field arithmetic. Such algorithms are typically encountered in communications applications, for example communications receivers which use Reed-Solomon decoders, and in cryptography.

[0017] The general operation of an embodiment of the techniques is initially provided, followed by an example implementation for a particular algorithm. In this case, for convenience and clarity of description, the inventive techniques are illustrated in relation to the Rijndael algorithm for private-key cryptography systems.

[0018] Optimizing Galois Field computations usually involves transformation of the arithmetic involved into appropriate subfield operations and the resulting conversion of data between the respective field representations. The overall efficiency of the computation depends on the choice of fields, mappings and related parameters. Finding an efficient implementation for the target hardware involves choosing these fields and mappings appropriately.

[0019] The described embodiment involves a recognition that wide-word architectures are particularly suited to Galois Field arithmetic, as a feature of Galois Fields is that arithmetic operations can be written in terms of operations in subfields. Depending upon the choice of subfield, many such mappings are possible. In accordance with embodiments of the invention a particular mapping (involving a particular subfield) can be identified as more efficiently implementable than others, depending upon the computation involved. Thus, arithmetic in any of a range of fields (subfields) can be determined, preferably calculated with relative efficiency by a programmable general-purpose architecture suitable for processing Galois Field arithmetic, as described below.

Galois Field Operations

Many encryption algorithms use exponentiation, which involves raising a number a (base) to some power e (exponent) mod p. In other words, b=a^(e) mod p. Exponentiation is basically repeated multiplication (for example, 7⁵=7.7.7.7.7).
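As an illustration of the exponentiation b=a^(e) mod p mentioned above, the following is a minimal Python sketch. The function names are hypothetical, and the square-and-multiply variant is a standard technique added here for comparison; it is not taken from the described embodiment.

```python
def modexp_naive(a, e, p):
    """Compute a**e mod p by repeated multiplication (e multiplications)."""
    result = 1
    for _ in range(e):
        result = (result * a) % p
    return result


def modexp_square_multiply(a, e, p):
    """Compute a**e mod p by scanning the bits of e (square-and-multiply)."""
    result = 1
    base = a % p
    while e > 0:
        if e & 1:                      # this bit of the exponent is set
            result = (result * base) % p
        base = (base * base) % p       # square for the next bit
        e >>= 1
    return result


# 7^5 = 7.7.7.7.7 reduced mod 13 (the modulus 13 is an arbitrary example value)
print(modexp_naive(7, 5, 13), modexp_square_multiply(7, 5, 13))
```

Both functions agree with Python's built-in three-argument pow for the values tried here; the second needs a number of steps proportional to the bit length of e rather than to e itself.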

[0020] Arithmetic modulo q can also be performed over polynomials of degree less than n. This involves, for each term, computing coefficient values mod q and also limiting the size of the polynomials by reducing modulo some irreducible polynomial of degree n. The result forms a Galois Field GF(q^(n)). The elements of the result are polynomials of degree (n−1) or lower: a(x)=a_(n−1)x^(n−1)+a_(n−2)x^(n−2)+ . . . +a₁x+a₀.

[0021] An example of GF(2³) is now provided. In GF(2³) there are 8 elements: 0, 1, x, x+1, x², x²+1, x²+x, x²+x+1. To compute the remainder modulo d(x)=x³+x+1, one can simply replace x³ with x+1.

[0022] Addition in GF(q^(n)) involves summing equivalent terms in the polynomial modulo q. This is simply XOR if q=2 (as it is in binary systems). In other words, a(x)+b(x)=(a_(n−1)+b_(n−1))x^(n−1)+ . . . +(a₁+b₁)x+(a₀+b₀). Table 1 below provides results of addition in GF(2³).

TABLE 1
+                 000  001  010  011  100  101  110  111
0 = 000           000  001  010  011  100  101  110  111
1 = 001           001  000  011  010  101  100  111  110
x = 010           010  011  000  001  110  111  100  101
x + 1 = 011       011  010  001  000  111  110  101  100
x² = 100          100  101  110  111  000  001  010  011
x² + 1 = 101      101  100  111  110  001  000  011  010
x² + x = 110      110  111  100  101  010  011  000  001
x² + x + 1 = 111  111  110  101  100  011  010  001  000

[0023] Adding polynomials is performed by adding like coefficients, modulo q, which in this case is 2, as is typically the case. Polynomial multiplication in GF(q^(n)) involves multiplying the two operand polynomials together and reducing the product modulo the field polynomial. Shifts and XOR operations can be conveniently used in the case of q=2, when implementing in digital logic. Table 2 provides results of multiplication in GF(2³).

TABLE 2
×                 001  010  011  100  101  110  111
1 = 001           001  010  011  100  101  110  111
x = 010           010  100  110  011  001  111  101
x + 1 = 011       011  110  101  111  100  001  010
x² = 100          100  011  111  110  010  101  001
x² + 1 = 101      101  001  100  010  111  011  110
x² + x = 110      110  111  001  101  011  010  100
x² + x + 1 = 111  111  101  010  001  110  100  011

[0024] As an example, consider multiplication in GF(2³), mod x³+x+1:

(x+1).(x+1)=x.(x+1)+1.(x+1)=x²+x+x+1=x²+2x+1=x²+1.

[0025] In a corresponding binary representation:

011.011=011<<1 XOR 011<<0=110 XOR 011=101.

[0026] A further example is given below:

$$\begin{aligned}
(x^{2}+1)\cdot(x^{2}+x) \bmod (x^{3}+x+1) &= x^{2}\cdot(x^{2}+x) + 1\cdot(x^{2}+x) \\
&= x^{4}+x^{3}+x^{2}+x \\
&= x\cdot(x^{3}+x+1) + 1\cdot(x^{3}+x+1) + (x+1) \\
&= x+1.
\end{aligned}$$

[0027] In a corresponding binary representation:

$$\begin{aligned}
101 \cdot 110 &= (110 \ll 2) \ \text{XOR} \ (110 \ll 0) = 11000 \ \text{XOR} \ 110 \\
&= 11110 \bmod 1011 = 11110 \ \text{XOR} \ (1011 \ll 1) \\
&= 1000 \bmod 1011 = 1000 \ \text{XOR} \ 1011 = 011
\end{aligned}$$
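The shift-and-XOR multiplication worked through in the examples above can be sketched in Python as follows. This is an illustrative sketch only; the function name gf_mul and its parameters are assumptions introduced here, with the field polynomial x³+x+1 encoded as the bit pattern 1011.

```python
def gf_mul(a, b, poly=0b1011, n=3):
    """Multiply a and b in GF(2^n), reducing modulo `poly` (default x^3+x+1)."""
    # Long multiplication: XOR together the copies of a shifted by each set bit of b.
    product = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    # Reduction: cancel each bit at position >= n with a suitably shifted copy of poly.
    for bit in range(product.bit_length() - 1, n - 1, -1):
        if (product >> bit) & 1:
            product ^= poly << (bit - n)
    return product


# The worked examples from the text, printed as 3-bit binary strings.
for a, b in [(0b011, 0b011), (0b101, 0b110), (0b011, 0b101)]:
    print(format(a, "03b"), ".", format(b, "03b"), "=", format(gf_mul(a, b), "03b"))
```

Addition in the same representation is simply the XOR of the two bit patterns (a ^ b in Python), so no separate helper is needed for it.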

[0028] In summary:

[0029] the operation of addition becomes an XOR operation of the binary representations,

[0030] e.g.

(x²+1)+(x²+x+1)=x

101 XOR 111=010

[0031] multiplication becomes shift & XOR (i.e. long multiplication),

[0032] e.g.

(x+1).(x²+1)=x.(x²+1)+1.(x²+1)=(x³+x+x²+1) mod (x³+x+1)=x²

011.101=(101)<<1 XOR (101)<<0=((1010 mod 1011) XOR (101 mod 1011)) mod 1011=001 XOR 101=100

[0033] Addition and multiplication operations performed in accordance with Galois Field arithmetic are used for performing an embodiment of the invention in the context of calculating the Rijndael algorithm.

[0034] Data Transformation for Data-Sliced Operation

[0035] Wide word computing architectures are well-known in the field of computer hardware design. In the context of embodiments described herein, data parallelism is matched appropriately with available SIMD architectural primitives through various data-slicing techniques. The disclosure of K. Diefendorff, P. Dubey, R. Hochsprung, and H. Scales, “AltiVec Extension to PowerPC Accelerates Media Processing”, IEEE Micro, March/April 2000, pp. 85-95, the contents of which are hereby incorporated by reference, provides a discussion of these techniques and is useful for implementing the data-slicing techniques that can be used with embodiments of the present invention.

[0036] Efficiency of Implementation on SIMD Architectures

[0037] A SIMD computing architecture is desirably used to provide a data-sliced implementation of the described Galois Field computations. In a data-sliced implementation, several independent instances of the underlying computation are performed in parallel. If the grain of the slicing is k bits, then the first k bits of all operands and results in all machine operations correspond to the first computation instance, the next k bits to the second instance, and so on. For k=1, this is the familiar bit-slicing technique.
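For the k=1 case, the rearrangement into and out of sliced form can be pictured with the following Python sketch. The helper names and the use of Python integers as stand-ins for machine words are assumptions made for illustration; an actual wide-word implementation would use the target machine's permute and shift instructions.

```python
def to_bit_sliced(values, n_bits):
    """Return n_bits words; bit j of word i holds bit i of values[j]."""
    planes = [0] * n_bits
    for j, value in enumerate(values):
        for i in range(n_bits):
            planes[i] |= ((value >> i) & 1) << j
    return planes


def from_bit_sliced(planes, count):
    """Inverse of to_bit_sliced: rebuild `count` values from the bit planes."""
    values = [0] * count
    for i, plane in enumerate(planes):
        for j in range(count):
            values[j] |= ((plane >> j) & 1) << i
    return values


# Three independent 3-bit operands, one per computation instance.
operands = [0b101, 0b011, 0b110]
planes = to_bit_sliced(operands, 3)
print([format(p, "03b") for p in planes])
print(from_bit_sliced(planes, len(operands)) == operands)
```

In this layout, bit position j of every word belongs to computation instance j, so a single bitwise instruction on a word acts on all instances at once.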

[0038] As indicated above, this data slicing technique is beneficial where the underlying computation can be performed efficiently on a k-bit computer. In the case of computations involving Galois Field arithmetic, such benefits are obtained for several values of k. To do this, GF(2^(n)) operations are mapped to procedures that use GF(2^(m)) operations for some m<n, such that those procedures can be efficiently implemented on a k-bit computer.

[0039] Such procedures are used as primitives to design for the computation an efficient implementation that targets a k-bit computer. Next, the wide-word computer is used to simultaneously simulate the working of a number (W/k) of k-bit computers, each performing an independent instance of the computation, where W is the number of bits in a word of the wide-word computer. This provides a work-efficient implementation for a W-bit computer: that is, the implementation averages k/W times as many cycles as the k-bit computer requires. Of course, there is an initial and final overhead to reorganize the data to and from the data sliced form.
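To make the idea of simulating many 1-bit computers concrete, the sketch below evaluates a small gate circuit for GF(2³) multiplication (field polynomial x³+x+1) on bit-sliced operands. The circuit, the helper names and the use of Python integers as wide words are illustrative assumptions; this is not the circuit of the described embodiment.

```python
def slice_bits(values, n_bits):
    """Bit plane i collects bit i of every value (instance j -> bit position j)."""
    return [sum(((v >> i) & 1) << j for j, v in enumerate(values))
            for i in range(n_bits)]


def unslice_bits(planes, count):
    """Rebuild the per-instance values from the bit planes."""
    return [sum(((p >> j) & 1) << i for i, p in enumerate(planes))
            for j in range(count)]


def gf8_mul_sliced(a, b):
    """GF(2^3) multiply on bit planes; each & or ^ acts on all instances at once."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    # Partial products: coefficient of x^k in the unreduced product.
    p0 = a0 & b0
    p1 = (a0 & b1) ^ (a1 & b0)
    p2 = (a0 & b2) ^ (a1 & b1) ^ (a2 & b0)
    p3 = (a1 & b2) ^ (a2 & b1)
    p4 = a2 & b2
    # Reduction using x^3 = x + 1 and x^4 = x^2 + x.
    return [p0 ^ p3, p1 ^ p3 ^ p4, p2 ^ p4]


xs = [0b011, 0b101, 0b011, 0b010]
ys = [0b011, 0b110, 0b101, 0b111]
planes = gf8_mul_sliced(slice_bits(xs, 3), slice_bits(ys, 3))
print([format(v, "03b") for v in unslice_bits(planes, len(xs))])
```

Each bitwise operation here stands in for one wide-word instruction, so the number of instructions is the gate count of the circuit regardless of how many instances are packed into the word.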

[0040] The success and effectiveness of this method requires an efficient implementation of the computation for a k-bit computer. As indicated above, this can be achieved by mapping GF(2^(n)) operations to subfield operations. Specifically, GF(2^(n)) operations are performed in an isomorphic composite field, GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])) where $\prod_{i=1}^{v} p[i] = n$. This maps one GF(2^(n)) operation to more than one GF(2^(p[1])) operation.

[0041] However, these new operations are much more efficient than the corresponding operation in GF(2^(n)); the motivation is that the equivalent GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])) computation is much cheaper than the GF(2^(n)) computation.
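As a sketch of how one operation in a composite field reduces to several subfield operations, the following Python code multiplies two elements of GF((2⁴)²) using only GF(2⁴) multiplications and XORs. The subfield polynomial x⁴+x+1 and the extension polynomial y²+y+λ with λ=w¹⁴ are taken from the examples in this description; the packing of an element into a byte (high nibble as the coefficient of y) and all function names are assumptions made for illustration.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (bit pattern 10011)."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for bit in (6, 5, 4):              # cancel terms of degree 6, 5 and 4
        if (r >> bit) & 1:
            r ^= 0b10011 << (bit - 4)
    return r


# lambda = w^14, where w = x (bit pattern 0010) satisfies w^4 + w + 1 = 0.
LAMBDA = 1
for _ in range(14):
    LAMBDA = gf16_mul(LAMBDA, 0b0010)


def composite_mul(a, b, lam=LAMBDA):
    """Multiply a = a1*y + a0 and b = b1*y + b0, using y^2 = y + lam."""
    a1, a0 = a >> 4, a & 0xF
    b1, b0 = b >> 4, b & 0xF
    hh = gf16_mul(a1, b1)                              # coefficient of y^2
    hi = hh ^ gf16_mul(a1, b0) ^ gf16_mul(a0, b1)      # coefficient of y
    lo = gf16_mul(a0, b0) ^ gf16_mul(lam, hh)          # constant term
    return (hi << 4) | lo


print(format(LAMBDA, "04b"), hex(composite_mul(0x10, 0x10)))
```

One multiplication on 8-bit elements becomes five multiplications and a few XORs on 4-bit elements, each of which corresponds to a much smaller gate circuit or lookup table.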

[0042] Another point of note is that there are many isomorphic fields possible for a given decomposition of n into p[i]'s depending on the underlying field polynomial chosen for each p[i], and the basis chosen for representation. Thus, selecting the appropriate decomposition of n and underlying field polynomials and basis gives an efficient implementation of the computation.

[0043] The theory of these relevant mathematical techniques is set out in Chapter 2 of Christof Paar's doctoral thesis: Christof Paar, Efficient VLSI Architectures for Bit-Parallel Computation in Galois Fields, PhD Thesis, Institute for Experimental Mathematics, University of Essen, Germany, 1994, the contents of which are hereby incorporated by reference. For convenience, a reference to this work is provided at http://www.ece.wpi.edu/Research/crypt/theses/paar_thesispage.html. Christof Paar's thesis discusses composite fields and how to convert elements from one isomorphic field to another.

[0044] Measures of Efficiency

[0045] In view of the utility of SIMD computers in performing the described Galois Field computations, a consideration of the possible efficiencies is warranted. A wide-word computer is capable of an amount of computation proportional to the width of its word. For instance, a 128-bit computer can do 16 byte XORs in one instruction, while the same task would take 16 instructions on an 8-bit computer.

[0046] Here, the 128-bit computer works as a SIMD computer, performing 16 computations in parallel. However, for more complex computations a speedup may not be obtainable.

[0047] For example, a lookup of a 256-element table can be performed on an 8-bit computer in a single instruction by using indirection, but usually several table lookups on computers with wider words cannot be performed simultaneously. In other words, how to generally exploit the full capability of a wide-word computer is not obvious.

[0048] Before designing, or choosing between, competing implementations, a measure of efficiency is required. For illustration, the following notions are used to compare computations running on different target machines: the complexity of a computation is defined as the number of cycles taken, and the work done in a computation is defined as complexity×width of the computer's word in bits.

[0049] In the example above, a byte XOR requires 8 units of work on an 8-bit computer, while the 128-bit computer also requires 8 units of work for each XOR performed, thus achieving equal work. The potential computing power of a wide-word computer can be fully exploited by devising work-efficient computations which can be performed in SIMD fashion on the wide-word computer.
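The work measure defined above can be checked with a short Python calculation for the byte XOR example. The function name work is a hypothetical helper introduced only for this sketch.

```python
def work(cycles, word_bits):
    """Work done = complexity (cycles taken) x width of the machine word in bits."""
    return cycles * word_bits


# One byte XOR on an 8-bit computer: one cycle on an 8-bit word.
work_8bit = work(cycles=1, word_bits=8)

# Sixteen byte XORs on a 128-bit computer: one cycle on a 128-bit word,
# shared equally between the 16 XORs performed in parallel.
work_128bit_per_xor = work(cycles=1, word_bits=128) // 16

print(work_8bit, work_128bit_per_xor)
```

Both values are equal, which is the sense in which the wide-word XOR is work-efficient.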

[0050] Overview

[0051] With reference to FIG. 1, an embodiment of the invention is described for generic algorithms. FIG. 1 illustrates a process combining data slicing with performing Galois Field operations in sub-fields to get an efficient SIMD implementation for Galois Field operations. Note that use of isomorphic composite fields involves:

[0052] Decomposing n into p[i]'s,

[0053] Selecting a field polynomial for each p[i],

[0054] Choosing a basis for representation.

[0055] In FIG. 1, each composite field in a list is considered in turn at decision step 100. In step 110, if not all composite fields have been considered, the next field F is considered. In step 120, a data transform and a corresponding inverse are designed for converting from the original field to composite field F and back. For each field F, a number of data slices are considered.

[0056] The next data slice of width k is considered in step 130. For each tested data slice, a transform and a corresponding inverse transform are designed in step 140 for providing input in data sliced form, and re-arranging from data sliced form after computation in data sliced form. Then, in step 150, W/k data-sliced independent computations (in F) are arranged in SIMD fashion, in accordance with the transform designed in step 140. The cost associated with steps 140 and 150 is calculated in step 160 in accordance with a predetermined cost function, for the data slice of width k.

[0057] Once all data slices are considered for a given F, the data slice k with the lowest total associated cost is selected in step 170. This involves determining the cost associated with step 120, and adding this calculated cost to that associated with steps 140 and 150 (as determined in step 160).

[0058] Once all composite fields are considered, the combination of field F and data slice k with the lowest calculated cost is finally selected in step 190 before processing terminates.

[0059] The operations described in FIG. 1 are now explored in greater detail in relation to a cost function in which the underlying computation involves finding the multiplicative inverse of 16 GF(2⁸) numbers (in this case, the underlying polynomial is x⁸+x⁴+x³+x+1). The target architecture is the Motorola Altivec or a similar architecture, for which W=128. The input and the output are stored as consecutive bytes in a 128-bit register.

[0060] As the objective is to obtain a fast Altivec implementation, the cost function of an operation θ, denoted by C(θ), is defined as the number of instructions for implementing θ on the target Altivec architecture.

[0061] In step 110, many composite fields are considered one by one. For each such field, various slices are considered in step 130. The subsequent cost evaluation in step 160 is illustrated in the following two examples.

[0062] The composite field under consideration is GF((2⁴)²), with the underlying polynomials x⁴+x+1 and x²+x+w⁷, where w⁴+w+1=0. Further, let k=1 be the slice size under consideration.

[0063] C(140), the cost associated with step 140, is taken to be 3072*(k/W)=24 instructions. This is so because a method is available to carry out the corresponding computation in a minimum of 3072 instructions. That is, given W/k=128 instances of the input stored in 128 registers, a data-sliced rearrangement of this input can be output in 3072 instructions, again stored in 128 registers (note that data-slicing implies the use of W/k instances of the input). This number of instructions is divided by (W/k) as the value of interest (and which is sought to be minimised) is the number of instructions per computation. Accordingly, the cost per computation is the appropriate measure for cost function comparisons.

[0064] The underlying computation in step 150 involves finding the inverses of 16 GF(2⁸) numbers. C(150), the cost associated with step 150, is taken to be 16*137*(k/W). This is because a circuit for step 150 has been constructed using 16*137 gates, as later described. In this case, the number of gates is taken to be a cost measure because the computation in step 150 involves the working of such a gate circuit (since k is 1). 16*137 Altivec instructions can be used to emulate the working of 128 copies of this circuit in parallel.

[0065] Next, the cost associated with step 120 is considered. The transform in step 120 is multiplication with the following matrix:

$$\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}$$

[0066] This multiplication can be computed using a circuit with 16*25 gates. Since bit-sliced data (k=1) is used, the circuit can be emulated as in the case of step 150, which similarly gives a cost of 16*25*(k/W) associated with step 120.
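The field transform of step 120 is a matrix-vector product over GF(2), which can be sketched in Python as below. The matrix entries are those given for the λ=w⁷ example; the bit ordering (leftmost column and topmost row correspond to the most significant bit) is an assumption made for this sketch and is not stated in the description. Under that assumption the element 1 maps to 1.

```python
# Transformation matrix for the lambda = w^7 example, row by row.
MATRIX = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
]


def apply_matrix(matrix, byte):
    """Multiply an 8x8 bit matrix by the bit vector of `byte` over GF(2).

    Assumed convention: column j multiplies input bit 7-j, and row i
    produces output bit 7-i (most significant bit first).
    """
    out = 0
    for row in matrix:
        bit = 0
        for j, m in enumerate(row):
            bit ^= m & (byte >> (7 - j)) & 1   # AND for product, XOR for sum
        out = (out << 1) | bit
    return out


print(hex(apply_matrix(MATRIX, 0x01)), hex(apply_matrix(MATRIX, 0x02)))
```

Because every output bit is an XOR of selected input bits, the same transform in bit-sliced form needs only XOR instructions on the input bit planes, which is why a gate count serves as its cost.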

[0067] In view of the above, the total cost computed in step 160 is:

F=(3072+16*137+16*25)*(k/W)=5664/128

[0068] In step 170, this total cost F, which corresponds to k=1, is compared with the relative costs associated with other values of k. The cost for other values of k (not shown in this illustration) turns out to be higher, and accordingly k=1 is used.

[0069] For further illustration and comparison of cost, the case where the composite field is GF((2⁴)²) is now considered, with underlying polynomials x⁴+x+1 and x²+x+w¹⁴, where w⁴+w+1=0. As before, this case is illustrated for k=1.

[0070] C(140) is, as before, 3072/128. It turns out that the transform in question does not change with the choice of polynomials. C(150)=16*134/128, since 16*134 is the number of gates in our circuit in this case (16 repetitions of FIG. 1 with λ=w¹⁴).

[0071] C(120) turns out to be 16*26/128. The matrix in this case for block 300 is shown below:

$$\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 1 & 1 & 0 & 1
\end{pmatrix}$$

[0072] Again, k=1 is chosen in step 170, and the total cost in this case is 5632/128, which compares favourably to the cost in the previous example.
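The selection loop of FIG. 1, applied to the two worked examples, can be sketched in Python as follows. Only the instruction and gate counts quoted above for k=1 are used; the dictionary layout, labels and function names are assumptions introduced for this sketch, and costs for other slice widths are omitted because they are not given in the description.

```python
from fractions import Fraction

W = 128  # word width of the target architecture, in bits

# Per-candidate counts quoted in the text, indexed by field and slice width k.
CANDIDATES = {
    "lambda=w^7":  {1: {"step 140": 3072, "step 150": 16 * 137, "step 120": 16 * 25}},
    "lambda=w^14": {1: {"step 140": 3072, "step 150": 16 * 134, "step 120": 16 * 26}},
}


def total_cost(parts, k, word_bits=W):
    """Cost per computation: summed counts scaled by k/W."""
    return Fraction(sum(parts.values()) * k, word_bits)


def select(candidates):
    """Return (cost, field, k) for the cheapest field and slice combination."""
    best = None
    for field, slices in candidates.items():        # loop over composite fields
        for k, parts in slices.items():             # loop over slice widths
            cost = total_cost(parts, k)
            if best is None or cost < best[0]:
                best = (cost, field, k)
    return best


for field, slices in CANDIDATES.items():
    print(field, float(total_cost(slices[1], 1)))
print(select(CANDIDATES))
```

With these figures the λ=w¹⁴ field gives 5632/128 against 5664/128 for λ=w⁷, in line with the comparison made above.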

[0073] Inverse Calculation Algorithm and Code

[0074] The task of performing embodiments of the invention involves mapping arithmetic operations of the Galois Field GF(2^(n)) to equivalent operations in appropriately chosen subfields of GF(2^(n)). A computer for performing such operations desirably supports the architectural features described in detail below. For the purposes of illustrating a particular example embodiment, set out directly below is a description of calculating the inverse of 16 GF(2⁸) elements (the word width is assumed to be 128 bits for the target hardware on which the operations are performed). The decomposition used is from GF(2⁸) to GF((2⁴)²).

[0075] The process is schematically represented in overview in FIG. 2.

[0076] In step 210, the width of the data slice k is determined, as above. In step 220, the target field F is determined, also as above. Once these design decisions have been settled, the source arithmetic instructions are received in step 230. In step 240, these source instructions are transformed to corresponding arithmetic operations in the target field F, which is determined in step 220.

[0077] In step 250, the input data is transformed to a data-sliced format having a data width of k bits, which is determined in step 210. At this stage, the arithmetic operations are performed in the target field F in step 260. Once completed, the results of these source operations are returned in step 270, having been performed in the target field F.

[0078] Inversion in GF((2⁴)²)

[0079] Set out below is code for the inversion architecture for inversion in GF((2⁴)²). The input and output are labelled by the registers. The shift operations are assumed to be of the form given in Table 3 below:

TABLE 3
/* n is the number of bits to be shifted */
a> Lshift(n, V1, V2)   /* V1 = V2 << n */
b> Rshift(n, V1, V2)   /* V1 = V2 >> n */

[0080] A table lookup of the form shown in Table 4 is also assumed:

TABLE 4
TBL(tab, V1, V2)   /* V2[i] = tab[V1[i]], where tab contains the table */

[0081] Table 5 indicates the tables which are assumed to be stored for access as required:

TABLE 5
1. Log4    additive to multiplicative form table for GF(2^4)
2. Alog4   multiplicative to additive form table for GF(2^4)
3. Inv4    the inverse table for GF(2^4)

[0082] The actual code is given directly below in Table 6.

TABLE 6
-----------------------<code begin>-----------------------
Load(V1, mem_locn, 16)      /* load 16 GF(2^8) numbers from the address 'mem_locn' */
TransformD(8, 4, V2, V1)    /* V2 contains the corresponding GF((2^4)^2) elements of the GF(2^8) elements of V1 */
Rshift(4, V3, V2)           /* The GF(2^4) numbers in even indices will be ignored (indices start from 0) */
Xor(V4, V2, V3)             /* V4=V2<bit-xor>V3 */
TBL(Log4, V3, V3)
Addmod(4, V6, V3, V3)       /* add elements of V3 to V3 mod (2^4−1) */
                            /* V6 = Alog4[V3^2] */
LoadI(V5, 14, 8)            /* Load the constant '14' into 16 bytes of V5 */
Addmod(4, V5, V5, V6)       /* V5 is the output of 'Cnst_mult4' */
TBL(Alog4, V5, V5)
TBL(Log4, V4, V4)
TBL(Log4, V2, V2)
Addmod(4, V2, V2, V4)
TBL(Alog4, V2, V2)          /* V2 is the output of 'Mult4[1]' */
XOR(V2, V2, V5)             /* V2 is the output of 'Add4[2]' */
TBL(Inv4, V2, V2)           /* V2 is the output of 'Inverse4' */
TBL(Log4, V2, V2)
/* Here V2 contains the multiplicative form of the output of Inverse4
   V3 contains the multiplicative form of [i_1..i_4]
   V4 contains the multiplicative form of the output of the operation Add4[1] */
Addmod(4, V3, V3, V2)
Addmod(4, V2, V2, V4)
TBL(Alog4, V3, V3)          /* V3 is the output of 'Mult4[2]' */
TBL(Alog4, V2, V2)          /* V2 is the output of 'Mult4[3]' */
/* Now transform back to GF(2^8) */
LoadI(V4, 15, 8)
And(V2, V2, V4)             /* V2=V2&V4 */
And(V3, V3, V4)
Lshift(4, V1, V3)
XOR(V1, V1, V2)
TransformU(4, 8, V1, V1)    /* Convert the GF((2^4)^2) elements to GF(2^8) elements */
-----------------------<code end>-----------------------
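The arithmetic performed by the listing in Table 6 can be followed on a single element with the Python sketch below. It mirrors the sequence of GF(2⁴) operations (Add4, Cnst_mult4, Mult4, Inverse4) using log and antilog tables, with λ=w¹⁴ and subfield polynomial x⁴+x+1. The function names, the handling of zero (mapped to zero, as is conventional for Rijndael), and the packing of the two GF(2⁴) halves into one byte are assumptions for illustration; the transforms to and from GF(2⁸) and the SIMD layout are omitted.

```python
# Antilog and log tables for GF(2^4) with w^4 + w + 1 = 0: ALOG4[i] = w^i.
ALOG4 = []
x = 1
for _ in range(15):
    ALOG4.append(x)
    x <<= 1
    if x & 0x10:
        x ^= 0x13                      # reduce modulo x^4 + x + 1
LOG4 = {value: i for i, value in enumerate(ALOG4)}

LAMBDA_LOG = 14                        # lambda = w^14


def mul4(a, b):
    """GF(2^4) multiply: add logs mod 15, with zero handled separately."""
    if a == 0 or b == 0:
        return 0
    return ALOG4[(LOG4[a] + LOG4[b]) % 15]


def inv4(a):
    """GF(2^4) inverse; zero is mapped to zero by convention (an assumption)."""
    return 0 if a == 0 else ALOG4[(-LOG4[a]) % 15]


def invert_composite(v):
    """Invert v = hi*y + lo in GF((2^4)^2), where y^2 = y + lambda."""
    hi, lo = v >> 4, v & 0xF
    s = hi ^ lo                                   # Add4[1]
    c = mul4(ALOG4[LAMBDA_LOG], mul4(hi, hi))     # Cnst_mult4: lambda * hi^2
    d = mul4(lo, s) ^ c                           # Mult4[1], then Add4[2]
    d_inv = inv4(d)                               # Inverse4
    return (mul4(hi, d_inv) << 4) | mul4(s, d_inv)   # Mult4[2], Mult4[3]


def composite_mul(a, b):
    """Reference multiply in GF((2^4)^2), used only to check the inverse."""
    a1, a0, b1, b0 = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = mul4(a1, b1)
    hi = hh ^ mul4(a1, b0) ^ mul4(a0, b1)
    lo = mul4(a0, b0) ^ mul4(ALOG4[LAMBDA_LOG], hh)
    return (hi << 4) | lo


print(all(composite_mul(v, invert_composite(v)) == 1 for v in range(1, 256)))
```

The single GF(2⁴) inversion in the middle is the only inverse lookup needed; everything else is multiplication and XOR in the subfield.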

[0083] Implementation of Rijndael Algorithm

[0084] An example implementation is now described of the above method in the specific context of the Rijndael algorithm. As will be demonstrated, an efficient implementation of Rijndael is obtained, which is conveniently provided in data-sliced form for parallel computation on SIMD computing architectures.

[0085] The described implementation assumes the availability of multiple blocks of input data that can be encrypted in parallel. This is a valid assumption if the data blocks are from independent streams. This can be achieved for a single stream of data using a parallelizable encryption scheme for private key block ciphers (such as, for example, Rijndael) using the techniques described in Charanjit S. Jutla, “Encryption Modes with Almost Free Message Integrity”, Cryptology ePrint Archive, Report 2000/039, 2000 (available at http://eprint.iacr.org/2000/039/), the contents of which are hereby incorporated by reference. Prior to this new scheme, encryption across blocks was forced to be serial in Cipher Block Chaining (CBC) mode and when a message authentication code (MAC) was desired.

[0086] In implementing the algorithm using the described techniques, the following design decisions were made:

[0087] 1. All operations in Rijndael are in GF(2⁸).

[0088] 2. The decomposition of n=8 as {p[1]=4, p[2]=2} is selected.

[0089] 3. The polynomial x⁴+x+1 is chosen as the field polynomial of GF(2⁴).

[0090] 4. All primitive polynomials of the form P(x)=x²+x+λ (where λ is an element of GF(2⁴)) are considered for p[2]=2. There are four such polynomials, which are: λ=w⁷, w¹¹, w¹³, w¹⁴ where w⁴+w+1=0.

[0091] 5. For each P(x), 7 different transformation matrices are obtained (depending on the different basis chosen).

[0092] 6. The cost function of an operation is chosen as the gate count of its gate circuit implementation.

7. The following choices may be made by applying the method explained in FIG. 1:

[0093] (a) Slice size k is chosen to be 1, as this corresponds with the lowest total cost.

[0094] (b) P(x)=x²+x+w¹⁴, where w is the primitive element of GF(2⁴), is the polynomial selected, which also provides the lowest total cost.

[0095] 1. The following transformation matrix was chosen:

$$\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 1 & 1 & 0 & 1
\end{pmatrix}$$

[0096] Gate Circuit Implementation

[0097] The gate circuit implementation of Rijndael is schematically illustrated in FIGS. 3.1 to 3.12, as described below with reference to these drawings. A corresponding software implementation can be obtained from the gate circuit by existing techniques, such as described in Eli Biham, A Fast New DES Implementation in Software, Technical Report CS0891, Computer Science Department, Technion - Israel Institute of Technology, 1997, the contents of which are hereby incorporated by reference.

[0098] FIGS. 3.1 to 3.12 collectively illustrate the operation of the gate circuit implementation for the Rijndael algorithm. The operation of the various blocks in these schematic representations is illustrated by the digital logic operations given below under the heading “Gate circuit functions” for the various blocks included in these drawings.

[0099] Gate Circuit Functions

[0100] The gate circuit of all operations can be given by Boolean expressions. For the operations schematically represented in the gate circuit implementation of FIGS. 3.1 to 3.12, the composition of the represented functions in terms of Boolean logic is given below.

[0101] The operation of the Rijndael algorithm is represented in FIG. 3.1 at the broadest level of abstraction. In this case, the number of rounds is denoted n, and the bytes are stored in column major form. The following drawings in FIGS. 3.2 to 3.12 successively define operations of FIG. 3.1 in terms of suboperations. Operations that are not defined in terms of suboperations are defined below in terms of digital logic expressions.

[0102] The Rijndael-impl block in FIG. 3.1 is represented in further detail in FIG. 3.2. Though the loop is shown unrolled in this case, it would of course be realised as a loop in an actual hardware implementation.

[0103] FIG. 3.3A represents the function of the Round_i block in FIG. 3.2, for 1≤i<n, while FIG. 3.3B represents the Round_n block in FIG. 3.2.

[0104] FIG. 3.4 represents the Byte_Sub operation of the Round operation of FIGS. 3.3A and 3.3B, while FIG. 3.5 represents the Shift_Row operation of the Round operation of FIGS. 3.3A and 3.3B. Shift_Row does not require any gates for implementation. The transform in this case is for a key length of 128 bits. FIG. 3.6 represents the Add_Round_Key operation of the Round operation of FIGS. 3.3A and 3.3B. FIG. 3.7 represents the Mix_Column operation of the Round operation of FIG. 3.3A.

[0105] FIG. 3.8 represents the Transform operation of FIGS. 3.1 and 3.2. Similarly, FIG. 3.9 represents the Inverse_Transform operation of FIG. 3.1.

[0106] FIG. 3.10 represents the Inverse8 operation of the Byte_Sub operation shown in FIG. 3.4.

[0107] FIG. 3.11 represents the Linear_Comb operation of the Mix_Column operation of FIG. 3.7. All data paths are 8 bits wide.

[0108] FIG. 3.12 represents the H(03).x and H(02).x operations of the Linear_Comb operation in FIG. 3.11. All data paths are 4 bits wide.

[0109] For operations in FIGS. 3.1 to 3.12 that are not otherwise defined in terms of other operations, digital logic implementations are provided below under a designated heading for each such operation.

[0110] In the notation used below, logical operations are denoted as indicated. When an operation has 2 operands, the operands are referred to by the symbols ‘a’ and ‘b’. The notation is as follows:

[0111] ‘^ ’ denotes the XOR operation,

[0112] ‘&’ denotes the AND operation,

[0113] ‘!’ denotes the NOT operation,

[0114] i[j] is used to represent i_(j),

[0115] o[j] is used to represent o_(j).

GF8toGF4 operation:
gate[0] = i[1] ^ i[3];
gate[1] = i[1] ^ i[6];
gate[2] = i[2] ^ i[4];
gate[3] = i[2] ^ i[7];
gate[4] = i[5] ^ i[7];
gate[5] = gate[1] ^ i[5];
gate[6] = gate[2] ^ i[7];
gate[7] = gate[2] ^ i[3];
gate[8] = gate[1] ^ gate[3];
gate[9] = gate[2] ^ i[8];
gate[10] = gate[5] ^ i[3];
gate[11] = gate[6] ^ i[1];
gate[12] = gate[5] ^ gate[9];
o[1] = gate[0];
o[2] = gate[10];
o[3] = gate[11];
o[4] = gate[7];
o[5] = gate[8];
o[6] = gate[6];
o[7] = gate[4];
o[8] = gate[12];

GF4toGF8 operation:
gate[0] = i[1] ^ i[3];
gate[1] = i[2] ^ i[4];
gate[2] = i[3] ^ i[6];
gate[3] = i[5] ^ i[7];
gate[4] = i[3] ^ i[7];
gate[5] = gate[0] ^ i[4];
gate[6] = i[2] ^ gate[3];
gate[7] = gate[1] ^ gate[3];
gate[8] = gate[0] ^ i[6];
gate[9] = gate[1] ^ gate[4];
gate[10] = gate[1] ^ i[8];
gate[11] = gate[6] ^ gate[8];
gate[12] = gate[5] ^ i[7];
o[1] = gate[2];
o[2] = gate[11];
o[3] = gate[8];
o[4] = gate[7];
o[5] = gate[12];
o[6] = gate[9];
o[7] = gate[5];
o[8] = gate[10];

Square4 operation:
gate[0] = i[1] ^ i[3];
gate[1] = i[2] ^ i[4];
o[1] = i[1];
o[2] = gate[0];
o[3] = i[2];
o[4] = gate[1];

Add4 operation:
gate[1] = a[1] ^ b[1];
gate[2] = a[2] ^ b[2];
gate[3] = a[3] ^ b[3];
gate[4] = a[4] ^ b[4];
o[1] = gate[1];
o[2] = gate[2];
o[3] = gate[3];
o[4] = gate[4];

Add8 operation:
gate[1] = a[1] ^ b[1];
gate[2] = a[2] ^ b[2];
gate[3] = a[3] ^ b[3];
gate[4] = a[4] ^ b[4];
gate[5] = a[5] ^ b[5];
gate[6] = a[6] ^ b[6];
gate[7] = a[7] ^ b[7];
gate[8] = a[8] ^ b[8];
o[1] = gate[1];
o[2] = gate[2];
o[3] = gate[3];
o[4] = gate[4];
o[5] = gate[5];
o[6] = gate[6];
o[7] = gate[7];
o[8] = gate[8];

Mult4 operation:
gate[0] = a[3] ^ a[2];
gate[1] = a[2] ^ a[1];
gate[2] = a[1] ^ a[4];
gate[3] = gate[2] & b[1];
gate[4] = a[3] & b[2];
gate[5] = a[2] & b[3];
gate[6] = a[1] & b[4];
gate[7] = gate[1] & b[1];
gate[8] = gate[2] & b[2];
gate[9] = a[3] & b[3];
gate[10] = a[2] & b[4];
gate[11] = gate[0] & b[1];
gate[12] = gate[1] & b[2];
gate[13] = gate[2] & b[3];
gate[14] = a[3] & b[4];
gate[15] = a[3] & b[1];
gate[16] = a[2] & b[2];
gate[17] = a[1] & b[3];
gate[18] = a[4] & b[4];
gate[19] = gate[3] ^ gate[4];
gate[20] = gate[5] ^ gate[6];
gate[21] = gate[7] ^ gate[8];
gate[22] = gate[9] ^ gate[10];
gate[23] = gate[11] ^ gate[12];
gate[24] = gate[13] ^ gate[14];
gate[25] = gate[15] ^ gate[16];
gate[26] = gate[17] ^ gate[18];
gate[27] = gate[19] ^ gate[20];
gate[28] = gate[21] ^ gate[22];
gate[29] = gate[23] ^ gate[24];
gate[30] = gate[25] ^ gate[26];
o[1] = gate[27];
o[2] = gate[28];
o[3] = gate[29];
o[4] = gate[30];

Inverse4 operation:
gate[0] = !i[4];
gate[1] = !i[2];
gate[2] = i[2] ^ i[1];
gate[3] = i[4] ^ i[3];
gate[4] = i[3] & i[2];
gate[5] = i[4] ^ i[1];
gate[6] = i[3] ^ i[2];
gate[7] = i[4] & i[3];
gate[8] = i[4] & i[2];
gate[9] = gate[3] & gate[1];
gate[10] = gate[4] & gate[5];
gate[11] = i[4] & gate[6];
gate[12] = gate[2] & i[3];
gate[13] = !gate[7];
gate[14] = gate[8] & i[1];
gate[15] = gate[2] & gate[0];
gate[16] = i[3] & i[1];
gate[17] = gate[2] ^ gate[9];
gate[18] = gate[11] ^ gate[12];
gate[19] = gate[13] & i[1];
gate[20] = gate[7] ^ gate[14];
gate[21] = gate[16] & gate[1];
gate[22] = gate[6] ^ gate[21];
gate[23] = gate[17] ^ gate[10];
gate[24] = gate[18] ^ gate[19];
gate[25] = gate[20] ^ gate[15];
gate[27] = i[4] ^ i[2];
gate[28] = !gate[27];
gate[29] = gate[28] & i[1];
gate[26] = gate[29] ^ gate[22];
o[4] = gate[23];
o[3] = gate[24];
o[2] = gate[25];
o[1] = gate[26];

Cnst_mult14 operation:
gate[0] = i[4] ^ i[3];
o[4] = gate[0];
o[3] = i[2];
o[2] = i[1];
o[1] = i[4];

Cnst_mult1 operation:
gate[0] = i[4] ^ i[1];
o[4] = i[1];
o[3] = gate[0];
o[2] = i[3];
o[1] = i[2];

Cnst_mult11 operation:
gate[0] = i[4] ^ i[3];
gate[1] = i[2] ^ i[1];
gate[2] = i[3] ^ gate[1];
gate[3] = gate[0] ^ i[2];
gate[4] = gate[0] ^ gate[1];
o[4] = gate[2];
o[3] = gate[0];
o[2] = gate[3];
o[1] = gate[4];

Cnst_mult12 operation:
gate[0] = i[4] ^ i[3];
gate[1] = i[2] ^ gate[0];
gate[2] = gate[1] ^ i[1];
o[4] = gate[2];
o[3] = i[4];
o[2] = gate[0];
o[1] = gate[1];

Affine operation:
gate[0] = i[1] ^ i[7];
gate[1] = i[3] ^ i[6];
gate[2] = i[4] ^ i[6];
gate[3] = i[2] ^ gate[2];
gate[4] = gate[0] ^ i[3];
gate[5] = gate[1] ^ i[8];
gate[6] = gate[0] ^ i[5];
gate[7] = !i[5];
gate[8] = !gate[3];
gate[9] = !gate[6];
o[1] = gate[7];
o[2] = gate[8];
o[3] = gate[4];
o[4] = gate[5];
o[5] = i[1];
o[6] = gate[1];
o[7] = gate[9];
o[8] = gate[2];
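Because every function above uses only XOR, AND and NOT, the same gate sequence can be applied to whole machine words, with each bit position of a word carrying an independent operand (slice size k=1). The following Python sketch illustrates this for Mult4 and Inverse4; it is a sketch under stated assumptions rather than the described implementation. The assumptions are that index 1 of a 4-bit operand holds the coefficient of w³ and index 4 the coefficient of w⁰, that arbitrary-width Python integers stand in for W-bit words, and that the intermediate gates of each listing may be folded into one expression per output. The helper names (`mul16`, `slice4`, `unslice4`) are hypothetical.

```python
# Sketch only: index 1 is assumed to be the w^3 coefficient, index 4 the w^0 one.

def mul16(a, b):
    """Reference GF(2^4) product modulo x^4 + x + 1 (bit j = coefficient of w^j)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def slice4(values):
    """Pack 4-bit values into bit-planes p[1..4]; lane n is bit n of each plane."""
    planes = [0] * 5  # index 0 unused so that indices match the listing
    for lane, v in enumerate(values):
        for j in range(1, 5):
            planes[j] |= ((v >> (4 - j)) & 1) << lane
    return planes

def unslice4(planes, count):
    """Inverse of slice4 for the first `count` lanes."""
    return [
        sum(((planes[j] >> lane) & 1) << (4 - j) for j in range(1, 5))
        for lane in range(count)
    ]

def mult4(a, b):
    """Mult4 listing, with gates 3 to 30 folded into the four output expressions."""
    g0, g1, g2 = a[3] ^ a[2], a[2] ^ a[1], a[1] ^ a[4]
    o1 = (g2 & b[1]) ^ (a[3] & b[2]) ^ (a[2] & b[3]) ^ (a[1] & b[4])
    o2 = (g1 & b[1]) ^ (g2 & b[2]) ^ (a[3] & b[3]) ^ (a[2] & b[4])
    o3 = (g0 & b[1]) ^ (g1 & b[2]) ^ (g2 & b[3]) ^ (a[3] & b[4])
    o4 = (a[3] & b[1]) ^ (a[2] & b[2]) ^ (a[1] & b[3]) ^ (a[4] & b[4])
    return [0, o1, o2, o3, o4]

def inverse4(i):
    """Inverse4 listing, folded the same way; every NOT sits under an AND."""
    i1, i2, i3, i4 = i[1], i[2], i[3], i[4]
    o4 = (i2 ^ i1) ^ ((i4 ^ i3) & ~i2) ^ (i3 & i2 & (i4 ^ i1))
    o3 = (i4 & (i3 ^ i2)) ^ ((i2 ^ i1) & i3) ^ (~(i4 & i3) & i1)
    o2 = (i4 & i3) ^ (i4 & i2 & i1) ^ ((i2 ^ i1) & ~i4)
    o1 = (~(i4 ^ i2) & i1) ^ (i3 ^ i2) ^ (i3 & i1 & ~i2)
    return [0, o1, o2, o3, o4]

# All 256 operand pairs are multiplied by a single pass through the gates.
pairs = [(x, y) for x in range(16) for y in range(16)]
a_planes = slice4([x for x, _ in pairs])
b_planes = slice4([y for _, y in pairs])
products = unslice4(mult4(a_planes, b_planes), len(pairs))

# All 16 field elements are inverted by a single pass (0 maps to 0).
inverses = unslice4(inverse4(slice4(list(range(16)))), 16)
```

In this form one pass through the gate sequence of Mult4 produces as many products as there are bit positions in the word, which is the effect that data slicing is intended to achieve on a W-bit machine.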

[0116] Proposed Computing Architecture

[0117] A computing architecture for a computer designed to support efficient processing of Galois Field arithmetic executed in accordance with the described embodiment is now described. Such an architecture desirably includes the architectural features listed below.

[0118] Load/Store: from memory to a set of processor registers

[0119] Common logical operations: such as OR, AND, XOR, inter- and intra-element rotate, etc.

[0120] SIMD mode: in which any operational primitive is supported with an explicit or implicit Galois Field width. For example, consider a SIMD architecture with a 128-bit wide datapath. A primitive such as Add4 V1, V2, V3 could mean the following: “Add the elements of register V1 to those of register V2, and store the result in register V3; assume operands to be elements of GF(2⁴)”. In other words, 32 elements, each a nibble wide, are added.

[0121] Table-lookup support: In implicit form, such support can be found in existing techniques, such as through the ‘permute’ primitive described in K. Diefendorff, P. Dubey, R. Hochsprung, and H. Scales, “AltiVec Extension to PowerPC Accelerates Media Processing”, IEEE Micro, March/April 2000, pp. 85-95. However, this architectural support is proposed in explicit form in the described embodiment and can be designed efficiently using existing techniques, such as those used in the implementation of the architecture described in the reference above.
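The SIMD-mode primitive Add4 V1, V2, V3 described above can be illustrated with a short Python sketch. Addition in GF(2⁴) is a bitwise XOR of the operands, so adding 32 nibble-wide elements reduces to a single XOR of two 128-bit words. In the sketch, Python integers stand in for 128-bit registers, and the nibble layout and the helper names (`pack_nibbles`, `unpack_nibbles`, `add4`) are assumptions made for the example.

```python
# Sketch only: Python ints stand in for 128-bit registers; names are hypothetical.
import random

LANES = 32                   # 128-bit register / 4-bit elements
MASK128 = (1 << 128) - 1

def pack_nibbles(elems):
    """Place element n in bits 4n..4n+3 of a 128-bit word (an assumed layout)."""
    word = 0
    for n, e in enumerate(elems):
        word |= (e & 0xF) << (4 * n)
    return word

def unpack_nibbles(word):
    """Split a 128-bit word back into its 32 nibble-wide elements."""
    return [(word >> (4 * n)) & 0xF for n in range(LANES)]

def add4(v1, v2):
    """Add4: lane-wise GF(2^4) addition, a single XOR of the two registers."""
    return (v1 ^ v2) & MASK128

rng = random.Random(0)
xs = [rng.randrange(16) for _ in range(LANES)]
ys = [rng.randrange(16) for _ in range(LANES)]
v3 = add4(pack_nibbles(xs), pack_nibbles(ys))
```

No carries propagate between lanes, so the same XOR serves for any element width; only multiplication and inversion need width-specific logic or table lookup.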

[0122] The above features are desirable in computing environments which implement the techniques described above. An implementation including the above architectural features provides an efficient platform for the techniques described above for executing algorithms involving Galois Field arithmetic.

[0123] Computer Hardware

[0124] A computer system 400, schematically represented in FIG. 4, is provided with the computing architectural features outlined directly above. Preferably, such a computer system 400 is used to execute Galois Field operations as described. However, as noted above, embodiments of the invention can be implemented using any conventional SIMD architecture, or indeed any existing general purpose (for example, non-SIMD) computing architecture. The process described above can be implemented as software, or computer readable program code, executing on the computer system 400.

[0125] The computer system 400 includes a computer 450, a video display 410, and input devices 430, 432. In addition, the computer system 400 can have any of a number of other output devices including line printers, laser printers, plotters, and other reproduction devices connected to the computer 450. The computer system 400 can be connected to one or more other computers via a communication input/output (I/O) interface 464 using an appropriate communication channel 440 such as a modem communications path, an electronic network, or the like. The network may include a local area network (LAN), a wide area network (WAN), an Intranet, and/or the Internet 420.

[0126] The computer 450 includes the control module 466, a memory 470 that may include random access memory (RAM) and read-only memory (ROM), input/output (I/O) interfaces 464, 472, a video interface 460, and one or more storage devices generally represented by the storage device 462. The control module 466 is implemented using a central processing unit (CPU) that executes or runs a computer readable program code that performs a particular function or related set of functions.

[0127] The video interface 460 is connected to the video display 410 and provides video signals from the computer 450 for display on the video display 410. User input to operate the computer 450 can be provided by one or more of the input devices 430, 432 via the I/O interface 472. For example, a user of the computer 450 can use a keyboard as input device 430 and/or a pointing device such as a mouse as input device 432. The keyboard and the mouse provide input to the computer 450. The storage device 462 can consist of one or more of the following: a floppy disk, a hard disk drive, a magneto-optical disk drive, CD-ROM, magnetic tape or any other of a number of non-volatile storage devices well known to those skilled in the art. Each of the elements in the computer 450 is typically connected to other devices via a bus 480 that in turn can consist of data, address, and control buses.

[0128] The method steps are effected by instructions in the software that are carried out by the computer system 400. Again, the software may be implemented as one or more modules for implementing the method steps.

[0129] In particular, the software may be stored in a computer readable medium, including the storage device 462, or may be downloaded from a remote location via the interface 464 and communications channel 440 from the Internet 420 or another network location or site. The computer system 400 includes the computer readable medium having such software or program code recorded such that instructions of the software or the program code can be carried out. The use of the computer system 400 preferably effects advantageous apparatuses for processing algorithms involving Galois Field arithmetic.

[0130] The computer system 400 is provided for illustrative purposes and other configurations can be employed without departing from the scope and spirit of the invention. The foregoing is merely an example of the types of computers or computer systems with which the embodiments of the invention may be practised. Typically, the processes of the embodiments are resident as software or a computer readable program code recorded on a hard disk drive as the computer readable medium, and read and controlled using the control module 466. Intermediate storage of the program code and any data including entities, tickets, and the like may be accomplished using the memory 470, possibly in concert with the storage device 462.

[0131] In some instances, the program may be supplied to the user encoded on a CD-ROM or a floppy disk (both generally depicted by the storage device 462), or alternatively could be read by the user from the network via a modem device connected to the computer 450. Still further, the computer system 400 can load the software from other computer readable media. This may include magnetic tape, a ROM or integrated circuit, a magneto-optical disk, a radio or infra-red transmission channel between the computer and another device, a computer readable card such as a PCMCIA card, and the Internet 420 and Intranets including email transmissions and information recorded on Internet sites and the like. The foregoing are merely examples of relevant computer readable media. Other computer readable media may be practised without departing from the scope and spirit of the invention.

[0132] Further to the above, the described methods can be realised in a centralised fashion in one computer system 400, or in a distributed fashion where different elements are spread across several interconnected computer systems.

[0133] Computer program means or computer program in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation or b) reproduction in a different material form.

[0134] It is to be understood that the invention is not limited to the embodiment described, but that various alterations and modifications, as would be apparent to one skilled in the art, are included within the scope of the invention.

We claim:
 1. A method of processing calculations for algorithms involving Galois Field arithmetic, the method comprising the steps of: mapping one or more source arithmetic operations in Galois Field GF(2^(n)) into sets of corresponding arithmetic operations for a plurality of respective isomorphic composite Galois Fields GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])), for one or more of each unique decomposition of n into p[i]s such that ∏_(i=1)^(v) p[i]=n; evaluating, for each respective set of corresponding operations, a field cost function relating to an implementation of the source arithmetic operations with the set of corresponding arithmetic operations; and selecting one of the sets of corresponding arithmetic operations as a target set of arithmetic operations, based on calculated results of an aggregate cost function based on the field cost function for each of the respective sets.
 2. The method as claimed in claim 1, further comprising the steps of: determining a data transformation for arranging data operands of said one or more source arithmetic operations into data-sliced format having k-bit operands, for the respective sets of corresponding arithmetic operations; and evaluating, for each respective set of corresponding operations, a data cost function relating to said data transformation; and calculating the aggregate cost function as a sum of the data cost function and the field cost function.
 3. The method as claimed in claim 1, further comprising the step of: simultaneously executing W/k of said target set of corresponding arithmetic operations for k-bit operands on W-bit digital computer hardware; wherein the results of said arithmetic operations in Galois Field GF(2^(n)) are obtained in k/W as many cycles of the W-bit computer compared with execution of the corresponding operations on a k-bit computer.
 4. The method as claimed in claim 1, wherein the aggregate cost function is representative of the relative computational efficiency of performing the source arithmetic operations as a set of corresponding arithmetic operations in a respective isomorphic composite Galois Field.
 5. The method as claimed in claim 1, wherein the cost function is representative of the hardware design efficiency of performing the source arithmetic operations as a set of corresponding arithmetic operations in a respective isomorphic composite Galois Field.
 6. The method as claimed in claim 1, wherein the cost function is representative of the number of gates required in a gate circuit implementation of the source arithmetic operations as a set of corresponding arithmetic operations in a respective isomorphic composite Galois Field.
 7. The method as claimed in claim 1, wherein the target set having the lowest associated result of the aggregate cost function is selected from the sets of corresponding arithmetic operations.
 8. The method of processing calculations for algorithms involving Galois Field arithmetic suitable, the method comprising steps of: mapping a source set of one or more source arithmetic operations in Galois Field GF(2^(n)) into a target set of corresponding arithmetic operations for an identified isomorphic composite Galois Field GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])), for which ∏_(i=1)^(v) p[i]=n; performing said corresponding arithmetic operations comprising said target set; and obtaining the results of said source arithmetic operations comprising said source set, based upon the results of said corresponding arithmetic operations comprising said target set; wherein said identified isomorphic composite Galois Field GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])) has been selected from a plurality of such isomorphic composite Galois Fields which each represent a unique decomposition of n into p[i]s such that ∏_(i=1)^(v) p[i]=n.
 9. The method as claimed in claim 8, wherein said selection of the identified isomorphic composite Galois Field is performed by steps of: mapping one or more source arithmetic operations in Galois Field GF(2^(n)) into respective sets of corresponding arithmetic operations for a plurality of isomorphic composite Galois Fields GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])) for unique decompositions of n into a set of p[i]s such that ∏_(i=1)^(v) p[i]=n; evaluating, for each respective set of corresponding operations, a field cost function relating to an implementation of the source arithmetic operations with the set of corresponding arithmetic operations; and selecting one of the sets of corresponding arithmetic operations as a target set of arithmetic operations, based on calculated results of an aggregate cost function based on the field cost function for each of the respective sets.
 10. The method as claimed in claim 9, further comprising the steps of: determining a data transformation for arranging data operands of said one or more source arithmetic operations into data-sliced format having k-bit operands, for the respective sets of corresponding arithmetic operations; and evaluating, for each respective set of corresponding operations, a data cost function relating to said data transformation; and calculating the aggregate cost function as a sum of the data cost function and the field cost function.
 11. The method as claimed in claim 8, further comprising the step of: simultaneously executing W/k of said target set of corresponding arithmetic operations for k-bit operands on W-bit digital computer hardware; wherein the results of said arithmetic operations in Galois Field GF(2^(n)) are obtained in k/W as many cycles of the W-bit computer compared with execution of the corresponding operations on a k-bit computer.
 12. The method as claimed in claim 9, wherein the aggregate cost function is representative of the relative computational efficiency of performing the source arithmetic operations as a set of corresponding arithmetic operations in a respective isomorphic composite Galois Field.
 13. The method as claimed in claim 9, wherein the cost function is representative of the hardware design efficiency of performing the source arithmetic operations as a set of corresponding arithmetic operations in a respective isomorphic composite Galois Field.
 14. The method as claimed in claim 9, wherein the field cost function is representative of the number of gates required in a gate circuit implementation of the source arithmetic operations as a set of corresponding arithmetic operations in a respective isomorphic composite Galois Field.
 15. The method as claimed in claim 9, wherein the target set having the lowest associated result of the aggregate cost function is selected from the sets of corresponding arithmetic operations.
 16. The method as claimed in claim 8, wherein the algorithm is the Rijndael algorithm.
 17. The method as claimed in claim 16, wherein n is 8 such that the arithmetic operations for the Rijndael algorithm are in Galois Field GF(2⁸).
 18. The method as claimed in claim 17, wherein the isomorphic composite Galois Field is GF((2⁴)²) in which p[1] is 4 and p[2] is 2.
 19. The method as claimed in claim 18, wherein, for the isomorphic composite Galois Field GF((2⁴)²), p[1] has a corresponding field polynomial of x⁴+x+1 and p[2] has a corresponding field polynomial of x²+x+w¹⁴ for which w⁴+w+1=0.
 20. The method as claimed in claim 19, wherein W of said target set of corresponding arithmetic operations are executed in parallel using W-bit digital computer hardware for 1-bit operands.
 21. An apparatus for processing calculations for algorithms involving Galois Field arithmetic suitable, the apparatus comprising: means for mapping a source set of one or more source arithmetic operations in Galois Field GF(2^(n)) into a target set of corresponding arithmetic operations for an identified isomorphic composite Galois Field GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])), for which ∏_(i=1)^(v) p[i]=n; means for performing said corresponding arithmetic operations comprising said target set; and means for obtaining the results of said source arithmetic operations comprising said source set, based upon the results of said corresponding arithmetic operations comprising said target set; wherein said identified isomorphic composite Galois Field GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])) has been selected from a plurality of such isomorphic composite Galois Fields which each represent a unique decomposition of n into p[i]s such that ∏_(i=1)^(v) p[i]=n.
 22. A computer program for processing calculations for algorithms involving Galois Field arithmetic suitable, the computer program comprising: code means for mapping a source set of one or more source arithmetic operations in Galois Field GF(2^(n)) into a target set of corresponding arithmetic operations for an identified isomorphic composite Galois Field GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])), for which ∏_(i=1)^(v) p[i]=n; code means for performing said corresponding arithmetic operations comprising said target set; and code means for obtaining the results of said source arithmetic operations comprising said source set, based upon the results of said corresponding arithmetic operations comprising said target set; wherein said identified isomorphic composite Galois Field GF(( . . . ((2^(p[1]))^(p[2])) . . . )^(p[v])) has been selected from a plurality of such isomorphic composite Galois Fields which each represent a unique decomposition of n into p[i]s such that ∏_(i=1)^(v) p[i]=n.